Spring Data JPA: Optimizing Dynamic Paginated Queries over Millions of Rows


2023-08-15 06:35 | Source: compiled from the web

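Preface

Paginated queries are about as common as operations get in business code. With a small data volume and well-chosen indexes, an ordinary dynamic query rarely has performance problems. Once a table grows to millions or tens of millions of rows, however, conventional pagination usually starts to hurt. This article does not cover sharding or caching; those approaches are described in much the same way all over the web and are not worth repeating here. Instead, it walks through a concrete case to show how pagination should be done, and how the code should be written, once the data reaches the million-row scale.

Optimizing ordinary paginated queries

Spring Data JPA's PagingAndSortingRepository interface makes pagination easy: extending it, or its sub-interface JpaRepository, is all it takes.

Start with a simple example, a paginated query with no query parameters at all.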

// ID type assumed to be Long
public interface AuthorsRepository extends JpaRepository<Authors, Long> {
}

@Service
public class AuthorsQueryService {

    private final AuthorsRepository authorsRepository;

    public AuthorsQueryService(AuthorsRepository authorsRepository) {
        this.authorsRepository = authorsRepository;
    }

    // pageNo is zero-based, as PageRequest.of expects
    public Page<Authors> queryPage(Integer pageNo, Integer pageSize) {
        return authorsRepository.findAll(PageRequest.of(pageNo, pageSize));
    }
}

The test data set contains a little over 2.7 million rows. How long does this query take? Run the following in a unit test:

long t1 = System.currentTimeMillis();
Page<Authors> page = authorsQueryService.queryPage(1, 10);
long t2 = System.currentTimeMillis();
System.out.println("page query cost time : " + (t2 - t1));

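The console output shows that the call takes about 1.2 s in total. That is already slow, and it gets slower once the transfer time to the browser is added. On a commercial site, a page that stalls for more than 1 s will most likely make the user leave.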
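A single currentTimeMillis measurement like the one above is fine for spotting a 1 s query, but it is easy to get the bookkeeping wrong when several calls are timed in a row. The helper below is a minimal sketch of a reusable timer based on System.nanoTime(); QueryTimer and Timed are hypothetical names that do not appear in the original code, and the lambda in main merely stands in for the real service call.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of the original article) for timing a single call. */
final class QueryTimer {

    /** Holds the value returned by the timed call together with the elapsed time. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the call once and measures it with the monotonic System.nanoTime() clock. */
    static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for authorsQueryService.queryPage(1, 10).
        Timed<String> timed = time(() -> "page content");
        System.out.println("page query cost time : " + timed.elapsedMillis + " ms");
    }
}
```

Because each call gets its own start and end values, the timings of successive queries cannot bleed into each other.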

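This query can be optimized, though. Running the two SQL statements printed on the console in Navicat shows that almost all of the time goes to the second one, the count query, whose only purpose is to compute the total number of pages.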
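To make the role of the two statements concrete, here is a minimal sketch of the arithmetic behind a counted page. PageMath is an illustrative name, not a Spring Data class; the point is only that the row offset needs nothing but the request, while the page total cannot be computed without the count.

```java
/** Plain-Java sketch of the arithmetic behind a counted page (names are illustrative). */
final class PageMath {

    /** Row offset of a zero-based page, i.e. the first argument of "limit ?, ?". */
    static long offset(int pageNo, int pageSize) {
        return (long) pageNo * pageSize;
    }

    /** Total number of pages; this is the only value that needs the count query. */
    static long totalPages(long totalElements, int pageSize) {
        return (totalElements + pageSize - 1) / pageSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(offset(1, 10));
        System.out.println(totalPages(2_700_000L, 10));
    }
}
```

If the UI never shows "page N of M", the second method is never needed, and neither is the expensive statement that feeds it.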

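So the first option for optimizing a paginated query is:

Avoid counting the total

For scenarios that do not need to display the total number of pages, this option fits perfectly.

Spring Data JPA lets a repository method return a Slice, which skips the count during pagination. All we need is a DAO-layer method that returns Slice: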

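// ID type assumed to be Long
public interface AuthorsRepository extends JpaRepository<Authors, Long> {

    // Returning Slice instead of Page skips the count query
    Slice<Authors> findAllBy(Pageable pageable);
}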

Add to the Service:

public Slice<Authors> querySlice(Integer pageNo, Integer pageSize) {
    return authorsRepository.findAllBy(PageRequest.of(pageNo, pageSize));
}

Add to the unit test:

long sliceStart = System.currentTimeMillis();
Slice<Authors> slice = authorsQueryService.querySlice(1, 10);
long sliceEnd = System.currentTimeMillis();
System.out.println("slice query cost time : " + (sliceEnd - sliceStart));

The console shows that Slice does skip the count query for the paginated query; the call takes only 32 ms.
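Dropping the count query does not leave the caller blind about further pages. A common technique is to request one row more than the page size and treat the presence of that extra row as proof that another page exists. The sketch below shows the idea on an in-memory list that stands in for the table; LookAheadSlice is an illustrative name and this is not the Spring Data implementation itself.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory sketch of the "fetch one extra row" technique (all names are illustrative). */
final class LookAheadSlice<T> {
    final List<T> content;
    final boolean hasNext;

    private LookAheadSlice(List<T> content, boolean hasNext) {
        this.content = content;
        this.hasNext = hasNext;
    }

    /** The 'rows' list stands in for the table; pageNo is zero-based. */
    static <T> LookAheadSlice<T> of(List<T> rows, int pageNo, int pageSize) {
        int from = (int) Math.min((long) pageNo * pageSize, rows.size());
        // Ask for pageSize + 1 rows, mimicking "limit offset, pageSize + 1".
        int to = (int) Math.min((long) from + pageSize + 1, rows.size());
        List<T> fetched = new ArrayList<>(rows.subList(from, to));

        // An extra row means at least one more page exists; it is dropped from the content.
        boolean hasNext = fetched.size() > pageSize;
        List<T> content = hasNext ? fetched.subList(0, pageSize) : fetched;
        return new LookAheadSlice<>(content, hasNext);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            rows.add(i);
        }
        LookAheadSlice<Integer> second = of(rows, 1, 10);
        System.out.println(second.content + " hasNext=" + second.hasNext);
    }
}
```

The cost is one additional row per request, which is negligible next to a count over millions of rows.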

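The Slice returned here is actually a SliceImpl object. It no longer exposes the total element count or the total page count, but its hasNext() method still tells us whether there is a next page.

The pagination here is fairly simple. What about dynamic queries with complex conditions?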

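Optimizing paginated dynamic queries

Put simply, a dynamic query uses a field as a query condition when the field has a value and ignores it otherwise. Spring Data JPA provides the JpaSpecificationExecutor interface for this kind of dynamically assembled SQL. Our DAO interface only needs to extend it: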

// ID type assumed to be Long
public interface AuthorsRepository extends JpaRepository<Authors, Long>, JpaSpecificationExecutor<Authors> {

    Slice<Authors> findAllBy(Pageable pageable);
}

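Add the following to the Service. It is a very simple dynamic query: if firstName has a value it is matched with a LIKE prefix pattern, and if lastName or email has a value it is matched for equality.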

public Slice<Authors> dynamicQuery(Authors authors, Integer pageNo, Integer pageSize) {
    return authorsRepository.findAll((Specification<Authors>) (root, query, criteriaBuilder) -> {
        List<Predicate> list = new ArrayList<>();
        // Prefix match on firstName, only when a value is supplied
        if (authors.getFirstName() != null && !authors.getFirstName().trim().isEmpty()) {
            list.add(criteriaBuilder.like(root.get("firstName").as(String.class), authors.getFirstName() + "%"));
        }
        // Equality match on lastName, only when a value is supplied
        if (authors.getLastName() != null && !authors.getLastName().trim().isEmpty()) {
            list.add(criteriaBuilder.equal(root.get("lastName").as(String.class), authors.getLastName()));
        }
        // Equality match on email, only when a value is supplied
        if (authors.getEmail() != null && !authors.getEmail().trim().isEmpty()) {
            list.add(criteriaBuilder.equal(root.get("email").as(String.class), authors.getEmail()));
        }
        Predicate[] p = new Predicate[list.size()];
        return criteriaBuilder.and(list.toArray(p));
    }, PageRequest.of(pageNo, pageSize));
}
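The "add a condition only when the field has a value" idea can also be looked at without JPA in the way. The following is a minimal plain-Java sketch using java.util.function.Predicate; DynamicFilterSketch and its nested Author class are hypothetical stand-ins, and startsWith plays the role of the LIKE prefix pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java sketch of optional filter conditions; Author is a hypothetical stand-in. */
final class DynamicFilterSketch {

    static final class Author {
        final String firstName;
        final String lastName;
        final String email;

        Author(String firstName, String lastName, String email) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.email = email;
        }
    }

    private static boolean hasText(String s) {
        return s != null && !s.trim().isEmpty();
    }

    /** Builds one predicate per populated field of the probe and ANDs them together. */
    static Predicate<Author> filterFor(Author probe) {
        List<Predicate<Author>> conditions = new ArrayList<>();
        if (hasText(probe.firstName)) {
            conditions.add(a -> a.firstName.startsWith(probe.firstName)); // like 'A%'
        }
        if (hasText(probe.lastName)) {
            conditions.add(a -> a.lastName.equals(probe.lastName));
        }
        if (hasText(probe.email)) {
            conditions.add(a -> a.email.equals(probe.email));
        }
        // With no populated field the result accepts every row, like an empty WHERE clause.
        return conditions.stream().reduce(a -> true, Predicate::and);
    }

    public static void main(String[] args) {
        Predicate<Author> filter = filterFor(new Author("A", "Bosco", null));
        System.out.println(filter.test(new Author("Alice", "Bosco", "alice@example.com")));
    }
}
```

Each populated field contributes exactly one condition, so the number of conditions in the final filter varies from request to request, just as the WHERE clause does in the Specification version.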

Add the test code to the unit test:

Authors queryDto = new Authors();
queryDto.setFirstName("A");
queryDto.setLastName("Bosco");
queryDto.setEmail("[email protected]");
long t4 = System.currentTimeMillis();
Slice<Authors> authorsSlice = authorsQueryService.dynamicQuery(queryDto, 1, 10);
long t5 = System.currentTimeMillis();
System.out.println("dynamic query cost time :" + (t5 - t4));

Look at the console output:

Hibernate: select authors0_.id as id1_0_, authors0_.added as added2_0_, authors0_.birthdate as birthdat3_0_, authors0_.email as email4_0_, authors0_.first_name as first_na5_0_, authors0_.last_name as last_nam6_0_ from authors authors0_ where (authors0_.first_name like ?) and authors0_.last_name=? and authors0_.email=? limit ?, ?
Hibernate: select count(authors0_.id) as col_0_0_ from authors authors0_ where (authors0_.first_name like ?) and authors0_.last_name=? and authors0_.email=?
dynamic query cost time :1025

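The call takes about 1 s in total, and there is a rather obvious problem:

Even though the method is declared to return Slice, the count query is still executed underneath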

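Inspecting authorsSlice in the debugger shows that its concrete type is PageImpl, not SliceImpl! Going back to the source, Page is a sub-interface of Slice, and the object that actually implements pagination without a count is SliceImpl. Here, although the method is declared to return Slice, the query issued through JpaSpecificationExecutor still returns a PageImpl, so the total is still counted.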
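Why the declared return type makes no difference comes down to ordinary subtyping. The sketch below reproduces the situation with minimal stand-in types; MiniSlice, MiniPage and MiniPageImpl are hypothetical and deliberately not the Spring Data types, and the hard-coded total only imitates an expensive count.

```java
import java.util.Collections;
import java.util.List;

/** Minimal stand-ins (hypothetical, not the Spring Data types) for Slice, Page and PageImpl. */
interface MiniSlice<T> {
    List<T> getContent();
}

interface MiniPage<T> extends MiniSlice<T> {
    long getTotalElements();
}

final class MiniPageImpl<T> implements MiniPage<T> {
    private final List<T> content;
    private final long total;

    MiniPageImpl(List<T> content, long total) {
        this.content = content;
        this.total = total;
    }

    @Override
    public List<T> getContent() {
        return content;
    }

    @Override
    public long getTotalElements() {
        return total;
    }
}

final class ReturnTypeDemo {

    /** Stand-in for a finder that always builds a counted page internally. */
    static MiniPage<String> findAll() {
        long total = 2_700_000L; // imagine an expensive count query here
        return new MiniPageImpl<>(Collections.singletonList("row"), total);
    }

    /** Declaring MiniSlice as the return type changes the static type only. */
    static MiniSlice<String> dynamicQuery() {
        return findAll();
    }

    public static void main(String[] args) {
        MiniSlice<String> result = dynamicQuery();
        System.out.println(result instanceof MiniPage);
    }
}
```

The caller sees a narrower interface, but the object, and the work done to build it, is unchanged. To avoid the count, the code that builds the result has to change, not the signature that exposes it.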

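Stepping into the source, the implementation of JpaSpecificationExecutor#findAll (in SimpleJpaRepository) shows that, because a paging parameter is passed in, the call goes into readPage, and readPage is where the count query is executed.

The count is hard-wired there, but the method is declared protected. The authors of the library seem to be telling us: if you are not happy with this method, override it! So the core of the optimization for dynamic pagination is:

Override the readPage method

The override is not complicated: drop the executeCountQuery call and assemble the PageImpl object directly.

We define a static nested class SimpleJpaNoCountRepository that extends SimpleJpaRepository and overrides readPage with the new paging logic, then expose a findAll method as the entry point. Because findAll is invoked on the subclass instance, readPage resolves to the subclass's version, which avoids the count.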

@Repository
public class CriteriaNoCountDao {

    @PersistenceContext
    protected EntityManager em;

    public <T, ID extends Serializable> Slice<T> findAll(final Specification<T> spec, final Pageable pageable, final Class<T> domainClass) {
        final SimpleJpaNoCountRepository<T, ID> noCountDao = new SimpleJpaNoCountRepository<>(domainClass, em);
        return noCountDao.findAll(spec, pageable);
    }

    /**
     * Custom repository type that disables the count query.
     */
    public static class SimpleJpaNoCountRepository<T, ID extends Serializable> extends SimpleJpaRepository<T, ID> {

        public SimpleJpaNoCountRepository(Class<T> domainClass, EntityManager em) {
            super(domainClass, em);
        }

        @Override
        protected <S extends T> Page<S> readPage(TypedQuery<S> query, Class<S> domainClass, Pageable pageable, Specification<S> spec) {
            query.setFirstResult((int) pageable.getOffset());
            query.setMaxResults(pageable.getPageSize());
            final List<S> content = query.getResultList();
            // No count query: the "total" passed here is only the size of this page,
            // so getTotalElements() and getTotalPages() on the result are not real totals.
            return new PageImpl<>(content, pageable, content.size());
        }
    }
}
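The mechanism this relies on is plain virtual dispatch: a public method of the base class calls a protected hook, and a subclass that overrides only the hook changes what the inherited public method does. Here is a minimal sketch with hypothetical classes (CountingFinder and NoCountFinder are not Spring types), where a counter stands in for the count query.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class mirroring the shape "public findAll calls protected readPage". */
class CountingFinder {
    int countQueries = 0;

    public List<String> findAll() {
        return readPage(Arrays.asList("a", "b"));
    }

    /** Default hook: pays for a (simulated) count query before returning the rows. */
    protected List<String> readPage(List<String> rows) {
        countQueries++;
        return rows;
    }
}

/** Subclass that overrides only the hook; the inherited findAll picks it up automatically. */
class NoCountFinder extends CountingFinder {
    @Override
    protected List<String> readPage(List<String> rows) {
        return rows; // no count query
    }
}

final class OverrideDemo {
    public static void main(String[] args) {
        CountingFinder base = new CountingFinder();
        CountingFinder noCount = new NoCountFinder();
        base.findAll();
        noCount.findAll();
        System.out.println(base.countQueries + " vs " + noCount.countQueries);
    }
}
```

This is also why the entry point has to be called on an instance of the subclass: calling findAll on the stock repository would still reach the original hook.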

Add the call to the Service:

public Slice<Authors> noPagingDynamicQuery(Authors authors, Integer pageNo, Integer pageSize) {
    return noCountPagingRepository.findAll((Specification<Authors>) (root, query, criteriaBuilder) -> {
        List<Predicate> list = new ArrayList<>();
        if (authors.getFirstName() != null && !authors.getFirstName().trim().isEmpty()) {
            list.add(criteriaBuilder.like(root.get("firstName").as(String.class), authors.getFirstName() + "%"));
        }
        if (authors.getLastName() != null && !authors.getLastName().trim().isEmpty()) {
            list.add(criteriaBuilder.equal(root.get("lastName").as(String.class), authors.getLastName()));
        }
        if (authors.getEmail() != null && !authors.getEmail().trim().isEmpty()) {
            list.add(criteriaBuilder.equal(root.get("email").as(String.class), authors.getEmail()));
        }
        Predicate[] p = new Predicate[list.size()];
        return criteriaBuilder.and(list.toArray(p));
    }, PageRequest.of(pageNo, pageSize), Authors.class);
}

Unit test and console output:

long noCountStart = System.currentTimeMillis();
Slice<Authors> noCountSlice = authorsQueryService.noPagingDynamicQuery(queryDto, 1, 10);
long noCountEnd = System.currentTimeMillis();
System.out.println("no paging dynamic query cost time :" + (noCountEnd - noCountStart));

Hibernate: select authors0_.id as id1_0_, authors0_.added as added2_0_, authors0_.birthdate as birthdat3_0_, authors0_.email as email4_0_, authors0_.first_name as first_na5_0_, authors0_.last_name as last_nam6_0_ from authors authors0_ where (authors0_.first_name like ?) and authors0_.last_name=? and authors0_.email=? limit ?, ?
no paging dynamic query cost time :148

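The output makes it clear that the override takes effect: only the data query is issued, which solves the problem that a Slice-typed dynamic query always ran the count query.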
